@mattwthompson
Member

@mattwthompson mattwthompson commented Oct 21, 2025

$ cp submissions/2024-11-07-Sage-2.1.0/yds.yaml submissions/2025-10-20-Sage-2.1.0/input.yaml

Submission Checklist

  • Created a new directory in the submissions directory containing the YDS input file and optionally a force field .offxml file
  • Triggered the benchmark workflow with a PR comment of the form /run-optimization-benchmarks path/to/submission/input.yaml or /run-torsion-benchmarks path/to/submission/input.yaml
  • Waited for the workflow to finish and a comment with Job status: success to be posted
  • Reviewed the results committed by the workflow
  • Published the corresponding Zenodo entry and retrieved the DOI
  • Added the Zenodo DOI to the table in the main README
  • Ready to merge!

@mattwthompson
Member Author

/run-optimization-benchmarks submissions/2025-10-20-Sage-2.1.0/input.yaml

@github-actions

A workflow has been dispatched to run the benchmarks for this PR.

  • Run ID: 18672934811
  • Triggering actor: github-actions[bot]
  • Target branch: rerun-sage-2.1.0

@github-actions

A workflow dispatched to run optimization benchmarks for this PR has just finished.

@mattwthompson
Member Author

I'm seeing some differences between this run and submissions/2024-11-07-Sage-2.1.0. They barely show up visually, but they do come through in the statistics.

Summary statistics for rmsd differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           64474.000000           64474.000000  64474.000000
mean                0.281187               0.281247     -0.000060
std                 0.274567               0.275336      0.086656
min                 0.000000               0.000000     -2.855092
25%                 0.118493               0.118378     -0.005638
50%                 0.191289               0.191362      0.000000
75%                 0.337260               0.336936      0.005735
max                 3.406551               4.095735      2.903419
Out of 64474 entries:
23740 entries have (absolute) difference greater than 0.01
5438 entries have (absolute) difference greater than 0.05
2331 entries have (absolute) difference greater than 0.1
364 entries have (absolute) difference greater than 0.5
101 entries have (absolute) difference greater than 1.0
Summary statistics for dde differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           54653.000000           54653.000000  5.465300e+04
mean               -0.745372              -0.745930  5.575600e-04
std                 3.399927               3.400385  4.248813e-01
min              -102.205111            -102.190539 -1.893245e+01
25%                -2.011216              -2.007744 -1.273394e-02
50%                -0.378216              -0.376148  3.304079e-12
75%                 0.800524               0.799714  1.300617e-02
max                96.402958              96.378667  1.524444e+01
Out of 64006 entries:
5504 entries have (absolute) difference greater than 0.1
1354 entries have (absolute) difference greater than 0.5
706 entries have (absolute) difference greater than 1.0
82 entries have (absolute) difference greater than 5.0
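
The threshold counts above could be reproduced along these lines. This is a minimal sketch, not the code linked from this comment: it assumes the per-conformer values of each run are available as dicts keyed by record ID, and the names `signed_differences`, `count_exceeding`, `old`, and `new` are hypothetical.

```python
import math


def signed_differences(old, new):
    """Return new - old for every record ID present in both runs, skipping NaNs."""
    diffs = {}
    for record_id in old.keys() & new.keys():
        delta = new[record_id] - old[record_id]
        if not math.isnan(delta):
            diffs[record_id] = delta
    return diffs


def count_exceeding(diffs, thresholds):
    """Count entries whose absolute difference is strictly greater than each threshold."""
    return {t: sum(abs(d) > t for d in diffs.values()) for t in thresholds}


# Toy values standing in for per-conformer RMSDs; these are not real results.
old = {"a": 0.10, "b": 0.50, "c": 1.20, "d": float("nan")}
new = {"a": 0.11, "b": 0.30, "c": 2.40, "d": 0.70}

diffs = signed_differences(old, new)
counts = count_exceeding(diffs, thresholds=(0.05, 0.1, 0.5, 1.0))
for threshold, n in counts.items():
    print(f"{n} entries have (absolute) difference greater than {threshold}")
```

Dropping NaN entries before counting is one possible reason a summary's `count` row and an "Out of N entries" line can disagree, though whether that applies here is not confirmed.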

Both RMSD results on the same plot:
image

Distribution of RMSD differences:
image

Both DDE results on the same plot:
image

Distribution of DDE differences:
image

Here's the code I used to generate these plots and statistics, which I ran from this branch

@mattwthompson mattwthompson marked this pull request as ready for review October 21, 2025 15:12
@lilyminium
Collaborator

From a quick look, the differences here are probably fine, and I think (with the caveat in the next sentence) we should merge it so Chapin has a more up-to-date comparison for the protein FF benchmarks. The only note I have is that you'd ideally want to be using the unconstrained version.

I'll leave some of my working below since this was a quick-and-dirty skim. A more rigorous check would actually compare the geometries between the two runs, instead of (here) comparing the difference from QM.

I visually checked the molecules with the highest RMSD differences, which are long and floppy. While we expect that flexible molecules can sometimes slide into a different minimum with minor differences in optimization steps, affecting the torsions, I'd expect bonds and angles to remain relatively inflexible. Bond ICRMSD differences range up to 0.005 at worst. The majority are very low in magnitude. The outlier points around 0.2 remain outlier points.
bond_rmsd

If I had more than 5 min I'd be curious which bond/s are contributing to the molecules with the highest differences in bond ICRMSD between runs (highest bond RMSD shown below) and look at geometries.
Screenshot 2025-10-22 at 5 52 18 pm

Same goes for angles: differences range up to 1 degree.
angle_rmsd

Again if I had more time I'd wonder what's going on with this seemingly uncomplicated molecule.
Screenshot 2025-10-22 at 6 02 59 pm

About 1.2k conformers had the exact same RMSD in both runs, leading me to think they probably minimized to the same structure. Checking the ddEs for these conformers (~750 of which didn't have nan energies) showed some differences ranging from -0.4 to 0.4 (kcal/mol?). It's likely this comes from the conformer minimum being different in geometry, although that's hard to confirm without checking. 452 conformers had ddE differences < 1e-6 though, and 601 < 1e-3, which seems reasonable.

dde_difference
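
The check described above (select conformers whose RMSD is identical in both runs, then look at their ddE differences) might look roughly like this. It is a sketch under assumptions, not the code actually used: the inputs are taken to be dicts keyed by record ID, and the function names and toy values are hypothetical.

```python
import math


def matching_rmsd_ids(rmsd_old, rmsd_new):
    """IDs whose RMSD is exactly equal in both runs."""
    shared = rmsd_old.keys() & rmsd_new.keys()
    return sorted(k for k in shared if rmsd_old[k] == rmsd_new[k])


def dde_differences(ids, dde_old, dde_new):
    """Signed ddE difference (new - old) for the given IDs, skipping NaN energies."""
    diffs = {}
    for k in ids:
        a = dde_old.get(k, math.nan)
        b = dde_new.get(k, math.nan)
        if not (math.isnan(a) or math.isnan(b)):
            diffs[k] = b - a
    return diffs


# Toy data; not taken from either benchmark run.
rmsd_old = {1: 0.25, 2: 0.40, 3: 0.10, 4: 0.33}
rmsd_new = {1: 0.25, 2: 0.41, 3: 0.10, 4: 0.33}
dde_old = {1: -1.5, 2: 0.2, 3: math.nan, 4: 0.75}
dde_new = {1: -1.5, 2: 0.3, 3: 0.1, 4: 0.5}

same = matching_rmsd_ids(rmsd_old, rmsd_new)
diffs = dde_differences(same, dde_old, dde_new)
for cutoff in (1e-6, 1e-3):
    n_below = sum(abs(d) < cutoff for d in diffs.values())
    print(f"{n_below} conformers have |ddE difference| < {cutoff}")
```

Exact float equality is deliberately strict here; a tolerance would admit near-identical structures at the cost of a less clear-cut selection.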

@mattwthompson
Member Author

That's a lot for 5 minutes! I've done much less in much more time this morning.

Here's a subset of environment differences, none of which should be a smoking gun:

{'openeye-toolkits': ('2024.1.3', '2025.1.1'),
 'openff-amber-ff-ports': ('0.0.4', '2025.09.0'),
 'openff-forcefields': ('2024.09.0', '2025.10.0'),
 'openff-interchange': ('0.4.0', '0.4.8'),
 'openff-interchange-base': ('0.4.0', '0.4.8'),
 'openff-qcsubmit': ('0.53.0', '0.57.0'),
 'openff-toolkit': ('0.16.5', '0.17.1'),
 'openff-toolkit-base': ('0.16.5', '0.17.1'),
 'openff-units': ('0.2.2', '0.3.1'),
 'openff-utilities': ('0.1.12', '0.1.16'),
 'openmm': ('8.1.2', '8.3.1'),
 'rdkit': ('2024.03.5', '2025.03.6')}

The automation does a conda env export, which is good for tracking what was run but doesn't make it easy to reuse that environment, especially on a different platform. It might be easier to use a tool like conda-lock instead.
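
A version diff like the one above can be built from the dependency entries of two exported environments. The sketch below assumes each entry is a `name=version=build` string, as conda env export writes them; the helper names and the build strings are hypothetical, and only the openmm and rdkit versions come from this thread.

```python
def parse_conda_specs(specs):
    """Map package name to version for 'name=version=build' entries."""
    versions = {}
    for spec in specs:
        if not isinstance(spec, str):
            continue  # e.g. a nested pip section, ignored in this sketch
        parts = spec.split("=")
        if len(parts) < 2:
            continue  # entry without a pinned version
        versions[parts[0]] = parts[1]
    return versions


def diff_versions(old, new):
    """Return {name: (old_version, new_version)} for packages whose version changed."""
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


# Build strings are placeholders; "examplepkg" is a hypothetical unchanged entry.
old_specs = ["openmm=8.1.2=build_0", "rdkit=2024.03.5=build_0", "examplepkg=1.0=build_0"]
new_specs = ["openmm=8.3.1=build_0", "rdkit=2025.03.6=build_0", "examplepkg=1.0=build_0"]

changed = diff_versions(parse_conda_specs(old_specs), parse_conda_specs(new_specs))
print(changed)
```

Packages present in only one environment are ignored here; reporting them separately would be a natural extension.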

I've pulled out a few molecules which get high ICRMSD differences but it's hard to draw conclusions quickly. I will pick this up later.

Otherwise I've

@mattwthompson
Member Author

For now I will follow your recommendation to keep this moving along. There's lots we can do to make these analyses easier; the barrier here was higher than I had hoped.

@mattwthompson mattwthompson merged commit 33ffe7c into main Oct 22, 2025
@mattwthompson
Member Author

mattwthompson commented Nov 11, 2025

Not sure where to put this analysis, but for now I'll just infodump on a few molecules I looked at

Key take-aways

  • Sulfonamides show up frequently in the worst RMSDs - both QM vs. MM and run-to-run MM
  • Most of the YDS run-to-run variance can be explained by "random" conformational changes
  • Nitrogens acting as hinges between planar groups also show up occasionally

To get another picture of the sort of differences in these YDS runs, here's a .describe() of a table showing only the differences between valence RMSDs (units are Angstroms and degrees):

statistic    Bond_diff  Angle_diff  Dihedral_diff  Improper_diff
count        64474.0    64474.0     64468.0        64254.0
null_count   0.0        0.0         6.0            220.0
mean         0.000043   0.008894    0.399945       0.083824
std          0.000077   0.023497    1.55017        0.348042
min          0.0        0.0         0.0            0.0
25%          0.00001    0.001616    0.031341       0.007912
50%          0.000026   0.004241    0.106133       0.025028
75%          0.000053   0.008939    0.279522       0.065661
max          0.004998   0.906306    41.369588      11.990329

I'm somewhat (?) reassured that the order of magnitude of bond differences is quite small and even the top quartile is not so bad. The order of magnitude of angle differences also seems good, even the maximum value. Torsion differences are more suspect in the middle and very suspect on the high end - so far, these seem to be where minimizing to a different conformer shows up quantitatively.
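
For illustration, per-term differences with null handling could be computed as below. This is a sketch that assumes one row per conformer per run with an ICRMSD for each valence term and `None` where a term is undefined; the column names mirror the table above, while the helpers and toy rows are hypothetical and not taken from the notebook.

```python
import statistics

TERMS = ("Bond", "Angle", "Dihedral", "Improper")


def valence_diffs(old_row, new_row):
    """Absolute run-to-run difference per term; None if either run lacks the term."""
    out = {}
    for term in TERMS:
        a, b = old_row.get(term), new_row.get(term)
        out[f"{term}_diff"] = None if a is None or b is None else abs(b - a)
    return out


def summarize(rows, column):
    """Small describe-like summary that reports nulls separately."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "null_count": len(rows) - len(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


# Toy rows; not real benchmark output.
old_rows = [
    {"Bond": 0.5, "Angle": 1.0, "Dihedral": 10.0, "Improper": None},
    {"Bond": 0.25, "Angle": 2.0, "Dihedral": 20.0, "Improper": 3.0},
]
new_rows = [
    {"Bond": 0.5, "Angle": 1.5, "Dihedral": 12.5, "Improper": None},
    {"Bond": 0.25, "Angle": 2.0, "Dihedral": 19.5, "Improper": 3.25},
]

diff_rows = [valence_diffs(o, n) for o, n in zip(old_rows, new_rows)]
for term in TERMS:
    print(term, summarize(diff_rows, f"{term}_diff"))
```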

Generally suspicious molecules

36966572

This molecule is just ... sorta cursed. I don't know if phosphates truly like to be part of a (7-membered) ring, but the 3-D structure is quite poor, mostly at the phosphate group. Here are the bonds labeled by how much they differ between QM and MM, shown wherever the difference is more than 0.02 Angstroms:

image
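
A labeling like this can be derived from the two geometries directly. The sketch below assumes Cartesian coordinates in Angstroms indexed by atom and a list of bonded atom-index pairs; the function name is hypothetical, and the MM-minus-QM sign convention is an assumption rather than something stated in this thread.

```python
import math


def bond_length_deviations(bonds, qm_coords, mm_coords, cutoff=0.02):
    """Return {(i, j): mm_length - qm_length} for bonds deviating by more than cutoff."""
    flagged = {}
    for i, j in bonds:
        qm_length = math.dist(qm_coords[i], qm_coords[j])
        mm_length = math.dist(mm_coords[i], mm_coords[j])
        delta = mm_length - qm_length  # assumed sign convention: MM minus QM
        if abs(delta) > cutoff:
            flagged[(i, j)] = delta
    return flagged


# Toy three-atom chain; coordinates are illustrative only.
qm_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
mm_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.1, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]

flagged = bond_length_deviations(bonds, qm_coords, mm_coords)
for (i, j), delta in flagged.items():
    print(f"bond {i}-{j}: {delta:+.4f} A")
```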

This molecule has the worst bond RMSD in both 2.1.0 runs and was the topic of discussion in one of Chapin's bug reports.

This is not necessarily relevant to the reproducibility of YDS runs, but notable.

@mattwthompson
Member Author

Molecules with suspiciously different bond RMSDs

Key take-aways:

  • The two worst offenders had significant conformational change
  • S-N bond lengths (but not S-O or N-R) in sulfonamides are bad and probably drive the numerical differences
  • Non-aromatic rings like to flop about (okay, not breaking news ...)

37015581

This is the first one you looked at above; it has the biggest difference in bond ICRMSDs between YDS runs. The molecule doesn't look particularly exotic, but there's an N+ in the middle that's causing some issues, with multiple bonds more than 0.02 A from QM.

2024:
image

2025:
image

The two YDS runs minimized to different conformers: last year's run wasn't too far away from QM (0.2 A RMSD) but this year's run was a bit further (0.6 A). In this year's run, the (non-planar in QM) ring that the N+ is a part of underwent an inversion that caused the other ring (the one with one OH, not two OHs) to rotate a little bit. DDEs are similar (1.919324 before vs. 1.929937 now), suggesting to me that the new conformer it found is okay. With some magic it might be possible to match these "new" conformers to existing ones, but I didn't do that.

43427452

This is the second-biggest run-to-run difference in bond lengths and is also due to finding a different conformer. The old results aren't great (1.0 A RMSD vs QM and several unhappy bonds):

image

but you can see that the aromatic ring hanging off of the cursed non-aromatic ring goes from one location (2024):
image

all the way to a separate one:
image

probably doing some pi-pi stacking on those aromatic rings (ddE decreased by about 10 kcal/mol).

36971754

This is the third-worst difference in bond RMSDs. There isn't a huge structural difference visually, just a bit of rotation of the 5-membered ring and of two hydrogens on the N on the other side of the molecule. Here's the "bad bonds" image for the 2025 run:

image

The 2024 run is about the same except the S-N bond is -0.0671. I'm guessing that there's a delicate balance between the proton/nitrogen geometry and electrostatics that an optimizer might not always agree with itself on. This is hand-wavy but both S-N bonds are bad and I'm not so alarmed that they're somewhat differently-bad.

36960186

This is the last run-to-run bond RMSD diff above 0.003. There's definitely a conformational change of the blob of non-aromatic rings (closer to the viewer in 3D, on the left in 2D), maybe a bit of them flopping around but definitely the trivalent nitrogen cet

2024 (0.3804 A RMSD overall vs. QM):

image image

2025 (1.2883 A):

image image

I notice that the S-N bond is quantitatively suspect in both, but not in agreement run-to-run.

37008583

The least exciting of them so far. The 3D structure is not so bad; QM-vs-MM overall RMSDs are 0.4585 and 0.4494 so I won't dwell on the 3D structure. The story is similar to before: S-N bonds are bad and not numerically consistent run-to-run:

2024:
image

2025:
image

Curious how the S-N bonds in the ring are nearly identical but the one on the right is not.

@mattwthompson
Member Author

Here's the notebook I've been using, in its current state: only partially tidied up, and not production-quality work: Compare.ipynb.txt


3 participants